Abstract


Approximation methods for the minimum average cost per unit time
problem with a controlled diffusion model are treated. In order to
work with a bounded state space, we use the reflecting diffusion
model of Stroock and Varadhan, although other models can also be
treated. The control problem is approximated by an average cost
per unit time problem for a Markov chain, and weak convergence
methods are used to show convergence of the minimum costs to that
for the optimal diffusion. The procedure is quite natural and
allows the approximation of many interesting functionals of the
optimal process.


1. Introduction. In this paper, we develop an approximation and
computational approach to a particularly difficult class of
stochastic control problems. The computational problem leads to the
approximation of the original process and optimization problem by
an interesting and simpler sequence of processes and optimization
problems, which yields much information on the original optimal
process.

Let $w(\cdot)$ denote an $R^r$-valued Wiener process, let $\mathcal{U}$ denote a
compact set and define the bounded and continuous functions
$f(\cdot,\cdot): R^r \times \mathcal{U} \to R^r$; $k(\cdot,\cdot): R^r \times \mathcal{U} \to R$;
$\sigma(\cdot): R^r \to r \times r$ matrices. Let $x(\cdot)$ denote a non-anticipative
solution to the Ito equation

(1)  $dx = f(x,u)\,dt + \sigma(x)\,dw$,

where $u(\cdot)$ is a non-anticipative (always with respect to $w(\cdot)$)
$\mathcal{U}$-valued progressively measurable control function. For
typographical simplicity we sometimes write $x_s$ for $x(s)$, etc.
Define $\gamma^u(\cdot)$ by

(2)  $\gamma^u(x) = \lim_{t\to\infty} E_x^u \frac{1}{t} \int_0^t k(x_s,u_s)\,ds$,

where $E_x^u$ denotes the expectation when $x_0 = x$ and control $u(\cdot)$
is used.
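As a rough illustration of the criterion (2), the following sketch estimates the time-average cost along one simulated path of a scalar instance of (1). Everything specific here is an assumption made for illustration and is not taken from the paper: the drift $f(x,u) = u$, the constant $\sigma = 1$, the clipped feedback $u = -x$ with $\mathcal{U} = [-3, 3]$, the (unbounded) cost $k(x,u) = x^2 + u^2$, and the use of a plain Euler scheme with a fixed seed.

```python
import math
import random


def average_cost(T=2000.0, dt=0.01, seed=0):
    """Estimate (1/T) * integral_0^T k(x_s, u_s) ds along one simulated path."""
    rng = random.Random(seed)
    steps = round(T / dt)
    sqrt_dt = math.sqrt(dt)
    x, total = 0.0, 0.0
    for _ in range(steps):
        u = max(-3.0, min(3.0, -x))      # hypothetical feedback, clipped to U = [-3, 3]
        total += (x * x + u * u) * dt    # hypothetical running cost k(x, u) = x^2 + u^2
        x += u * dt + sqrt_dt * rng.gauss(0.0, 1.0)   # Euler step of dx = u dt + dw
    return total / (steps * dt)


if __name__ == "__main__":
    print(average_cost())
```

For this toy model the unclipped stationary variance of $x$ is 1/2, so the long-run average cost should be near 1. The estimate from a single path is only a Monte Carlo approximation and says nothing about optimality.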

We are interested in finding good approximations to the infimum
$\bar\gamma$ of $\gamma^u(x)$ over all controls $u(\cdot)$, and to the optimal control,
and also other information concerning the optimal trajectory, in
cases where $\gamma^u(x)$ does not depend on the initial state $x$.
Furthermore, we want to be able to compute the approximation and
obtain the additional information using practical computational
methods.

A number of difficulties stand in the way of a practical
computation. First, the state space $R^r$ of $x(\cdot)$ is unbounded and the
control problem (1) - (2) will have to be modified so that the
state space is bounded. This is a particularly ticklish point,
since we want a modification which yields usable information
concerning the original problem. In particular situations, a great
deal of attention must be devoted to this. For definiteness, we
use the bounded process defined in Section 4, although many others
are possible. Next, we have not assumed very much about the system
(1). If $\gamma^u(\cdot)$ actually depends on $x$, then very little is known
about the problem. Fortunately, for many problems (perhaps the
most important ones) we can restrict attention to $u(\cdot)$ which are
stationary ($u(\cdot)$ is a stationary process), or to the stationary
pure Markov case (where $u_t = u(x_t)$). Even then, the solution to
(1) may not be unique. In practical problems, it is often demanded
that the system have a certain robustness. Criteria such as (2) are
of interest when the system is to operate over a long period of
time, usually of uncertain duration and with an uncertain initial
condition. It is usually desired that the control be stationary
pure Markov, and that for the controls $u(\cdot)$ in the class which
are to be considered there be an invariant measure $\mu^u$, and the
measures of $x(t)$ tend to $\mu^u$ as $t \to \infty$ for each $x = x_0$. In
certain cases (e.g., Kushner [1]) one can restrict attention to
such controls. In general, little is known about the continuous
parameter problem, and many of the difficulties in the way of
establishing convergence of a computational procedure are due to
this. Also, it is usually hard to approximate problems over an
infinite time interval, unless the approximation and limit
processes are stationary. Furthermore, the ergodic subsets for
each approximation may depend on the approximation. In any case,
the procedures to be developed here are very natural, provide much
information, and do give the desired results under broad
conditions. We will later make an additional assumption on the
system.

Our approach follows the ideas in Kushner [2], [3] and Kushner
and DiMasi [4]. The problem (1), (2) is approximated by a control
problem on a Markov chain (with approximation parameter $h$), and
weak convergence methods are used to show that certain
interpolations of the sequence of approximating chains converge weakly to an
optimal process. The limit yields a great deal of information on
the optimal process; e.g., invariant measures and joint distributions.

A formal dynamic programming approach to the optimization of (1),
(2) is given in Section 2. Section 3 argues for a "computational
approximation" and a bounded state space. The actual form of the
bounded state space model, the Stroock-Varadhan model of a reflected
diffusion [5], is discussed in Section 4. This model is used partly
for the sake of specificity and partly because it allows us to
illustrate some interesting features of the weak convergence and
boundary time scaling. The actual discrete state model is developed
in Section 5, and Sections 6 and 7 give the weak convergence results.

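2. A Dynamic Programming Sufficient Condition for
Optimality for (1), (2).

Let $\mathcal{L}^u$ denote the differential generator of (1):

$\mathcal{L}^u = \sum_i f_i(x,u) \frac{\partial}{\partial x_i} + \sum_{i,j} a_{ij}(x) \frac{\partial^2}{\partial x_i \partial x_j}$,

$a(\cdot) = \sigma(\cdot)\sigma'(\cdot)/2$.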


When evaluating $\mathcal{L}^u F(\cdot)$ at $t, \omega$, for a $C^2(R^r)$ function $F(\cdot)$,
set $x = x_t$, $u = u_t$. Suppose that there is a $C^2(R^r)$ function
$V(\cdot)$ and a constant $\gamma$ such that

(3)  $\inf_{\alpha \in \mathcal{U}} [\mathcal{L}^\alpha V(x) + k(x,\alpha) - \gamma] = 0$,


where $\mathcal{L}^\alpha$ is now treated as a parametrized operator. If there
is a Borel function $u(\cdot)$ on $R^r$ such that $\alpha = u(x)$ minimizes
at $x$ in (3) for each $x \in R^r$, and to which there corresponds a
process (1) such that $E_x^u V(x_t)/t \to 0$, then

(4a)  $\gamma = \lim_{t\to\infty} \frac{1}{t} E_x^u \int_0^t k(x_s,u_s)\,ds$.

If, in addition, $v(\cdot)$ is any $\mathcal{U}$-valued non-anticipative $(\omega,t)$
progressively measurable function (henceforth called a control)
corresponding to which there is a solution to (1), and if
$\frac{1}{t} E_x^v V(x_t) \to 0$, then

(4b)  $\gamma \le \lim_{t\to\infty} \frac{1}{t} E_x^v \int_0^t k(x_s,v_s)\,ds$,



and $u(\cdot)$ is optimal with respect to such $v(\cdot)$ in the sense that
$\gamma^u \le \gamma^v$ for any $x_0$, either fixed or random. Under $u(\cdot)$ or $v(\cdot)$,
(1) is homogeneous, but there is not necessarily a unique invariant
measure.
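The step from (3) to (4a) and (4b) can be outlined with Ito's formula. The following is only a formal sketch: it assumes, beyond what is stated above, that the stochastic integral $\int_0^t V_x'(x_s)\sigma(x_s)\,dw_s$ has zero expectation under both $u(\cdot)$ and $v(\cdot)$.

```latex
% Formal sketch; assumes E \int_0^t V_x'(x_s)\sigma(x_s)\,dw_s = 0.
\begin{align*}
E_x^u V(x_t) - V(x)
  &= E_x^u \int_0^t \mathcal{L}^{u} V(x_s)\,ds
   = \gamma t - E_x^u \int_0^t k(x_s,u_s)\,ds
  && \text{(equality in (3) at } \alpha = u(x_s)\text{)}, \\
\gamma
  &= \frac{1}{t} E_x^u \int_0^t k(x_s,u_s)\,ds
   + \frac{E_x^u V(x_t) - V(x)}{t}, \\
E_x^v V(x_t) - V(x)
  &= E_x^v \int_0^t \mathcal{L}^{v} V(x_s)\,ds
   \ge \gamma t - E_x^v \int_0^t k(x_s,v_s)\,ds
  && \text{(since } \mathcal{L}^{\alpha}V + k - \gamma \ge 0 \text{ for every } \alpha\text{)}.
\end{align*}
```

Dividing the last line by $t$ and letting $t \to \infty$, the conditions $E_x^u V(x_t)/t \to 0$ and $E_x^v V(x_t)/t \to 0$ give (4a) from the second line and (4b) from the third.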
3. Bounded State Space Approximations. The approximation and
computational method developed in [2] is roughly as follows. Let
$u(\cdot)$ be fixed, and let it be a function only of the state $x$. We
derive a family (parametrized by $h$) of Markov chains. For fixed
$u(\cdot)$, the sequence of (suitable) continuous parameter
interpolations of the chains converges weakly to the solution to (1), as
$h \to 0$, under broad conditions. For each $h$, we have a controlled
(indexed by $u(\cdot)$) family of Markov chains. Optimize, using the
appropriate Markov chain version of (2), and obtain the minimum
value function for each chain. As $h \to 0$, the sequence of minimum
values converges to the infimum, over a large class of comparison
controls, of the value function of the original problem. Also,
many properties of the approximations converge to similar
properties of the limiting optimal process.

Since  our  interest  is  in  feasible  computations,  as  well  as  in 
convergence,  it  is  necessary  that  for  each  h the  state  space  of 
the  approximating  chain  be  finite.  This  requirement  necessitates 
revision of the original system (1). The following are among
several  possibilities  that  can  be  dealt  with. 

(i) The state space may be naturally bounded, in that there
are bounded sets $G_0$, $G_1$ such that if $x_0 \in G_0$, then $x_t \in G_1$ for
all $t$ and all $u(\cdot)$.

(ii) If $x_0 \in G_0$, then the approximating Markov chain remains
in $G_1$, for all $h$, under the optimizing controls.

(iii) Impulsive control terms ([2], Chapter 8) are added to the
cost function, such that the state is guaranteed to be "impulsively"
driven into $G_0$, if it ever leaves $G_1$.

(iv) A bounded set $G$ can be introduced, such that $x_t$ is
not allowed to leave $\bar G = G + \partial G$. To guarantee this, a suitable
boundary process is introduced on $\partial G$.

For concreteness in the development, a particular form of (iv)
will be dealt with. We let $G$ be a hyper-rectangle and suppose
that $x_t$ is reflected from $\partial G$. A hyper-rectangle is chosen only
to simplify the specification of the approximation on the boundary.
Any region for which a specification with the proper convergence

4. The Submartingale Problem of Stroock and Varadhan [5] in $\bar G$.

In order to assure ourselves that the reflection is well defined,
assume

(A1) for each $i$, $a_{ii}(x)$ is strictly positive on the boundary
planes of $G$ which are parallel to $\{x: x_i = 0\}$,
where $x_i$ = $i$th component of $x$.


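We introduce a boundary control and cost function. Let $\mathcal{U}_0$ denote
a compact set, and define the bounded continuous functions
$\gamma(\cdot,\cdot): \partial G \times \mathcal{U}_0 \to R^r$; $k_0(\cdot,\cdot): \partial G \times \mathcal{U}_0 \to R$;
$\rho(\cdot): \partial G \to [0,1]$. Let the vector $\gamma(x,\alpha)$ with origin $x$
point strictly interior to $G$ for each $x \in \partial G$ and $\alpha \in \mathcal{U}_0$.
For $A \subset R^r$, set $I_A(x)$ = indicator of set $\{x: x \in A\}$, and let
$x(\cdot)$ denote the generic element of $C^r[0,\infty)$ ($R^r$-valued continuous
functions on $[0,\infty)$) as well as the solution to (1). Hopefully, no
confusion will arise.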

Define $C_G^r = C^r[0,\infty) \cap \{x(\cdot): x_t \in \bar G, \text{ all } t < \infty\}$ and
$\mathcal{B}_t$ = $\sigma$-algebra on $C_G^r$ induced by the projections $x_s$, $s \le t$. For
this reflecting diffusion, admissible controls $u(\cdot)$ are $\mathcal{U}$-valued
when the process state $x_t \in G$, and are $\mathcal{U}_0$-valued when the process
state++ $x_t \in \partial G$. For+ $q(\cdot,\cdot) \in C^{2,1}(\bar G \times [0,\infty))$ and admissible
$u(\cdot)$, define the function $F_q^u(\cdot,\cdot)$ on $C_G^r \times [0,\infty)$ by

(5)  $F_q^u(x(\cdot),t) = q(x_t,t) - q(x_0,0) - \int_0^t [\frac{\partial}{\partial s} + \mathcal{L}^u] q(x_s,s) I_G(x_s)\,ds$.


For the moment, let $u(\cdot)$ depend only on the current state $x$.
Suppose that for some $y \in \bar G$, there is a measure $P_y^u$ on $C_G^r$ such
that $P_y^u\{x_0 = y\} = 1$ and for each $q(\cdot,\cdot)$ in $C^{2,1}(\bar G \times [0,\infty))$ for which
$\rho(x) q_t(x,t) + \gamma'(x,u(x)) q_x(x,t) \ge 0$ for all $x \in \partial G$, and all $t \ge 0$,
the process $\{F_q^u(\cdot,t), \mathcal{B}_t, P_y^u\}$ is a submartingale. Then $P_y^u$ is
said to solve the submartingale problem for initial value $y$. If,
in the above, the vector $y$ can be replaced by a measure $\nu_0$ on $\bar G$,
and $P_{\nu_0}^u\{x_0 \in \Gamma\} = \nu_0(\Gamma)$ for each Borel set $\Gamma$, then $P_{\nu_0}^u$ is said
to solve the submartingale problem for initial measure $\nu_0$.

If $u(\cdot)$ depends only on the current state $x$, then the solution
to the submartingale problem gives the desired reflected diffusion,
and $\gamma(x,u(x))$ is the average "direction of reflection" at $x \in \partial G$,
and $\rho(x)$ is a scale factor which determines the relative time that
$x(\cdot)$ spends on $\partial G$ ([2], [3], [5]). Since $\rho(\cdot)$ only affects the
time scale, and not the costs ([3], [2], Chapter 10), for our
modelling purpose it is sufficient to set $\rho(x) \equiv 1$, which we will
do.

+ $C^{2,1}$ is the set of uniformly bounded continuous functions on
$\bar G \times [0,\infty)$ whose derivatives up to second order in $x$ and first
in $t$ are continuous and uniformly bounded.

++ and $u_t$ is $\mathcal{B}_t$ measurable.

Let $P_y^u$ solve the submartingale problem. There is a
non-decreasing scalar valued process $\mu(\cdot)$, which only increases when
$x_t \in \partial G$, and is such that for the above $q(\cdot,\cdot)$

(6)  $F_q^u(x(\cdot),t) - \int_0^t [q_s(x_s,s) + \gamma'(x_s,u_s) q_x(x_s,s)]\,d\mu_s$

is a martingale (with respect to $\{P_y^u, \mathcal{B}_t\}$). Furthermore, there
is a standard Wiener process+ $w(\cdot)$ such that under $P_y^u$,
$(x(\cdot), u(\cdot), \mu(\cdot))$ are non-anticipative with respect to $w(\cdot)$ and
w.p.1.

(7)  $x_t = y + \int_0^t f(x_s,u_s) I_G(x_s)\,ds + \int_0^t \sigma(x_s) I_G(x_s)\,dw_s + \int_0^t I_{\partial G}(x_s) \gamma(x_s,u_s)\,d\mu_s$.


For the control problem, we may wish to deal with a larger class
of (admissible) controls than the stationary pure Markov class.
We can still speak of a solution to the submartingale problem, but
then the measure $P_y^u$ or $P_{\nu_0}^u$ must be defined on the appropriate
$\sigma$-algebra on the product space of $C_G^r$ and the path space for the
control process. If this extended submartingale problem has a
solution, then the non-decreasing process $\mu(\cdot)$ and Wiener process
$w(\cdot)$ will still exist and (6), (7) hold.


A modified control problem. Suppose that there is a solution to
the submartingale problem corresponding to admissible control
$u(\cdot)$, and initial condition $y$. Define $\gamma^u(y)$ now by

(8)  $\gamma^u(y) = \overline{\lim}_{t\to\infty} \frac{1}{t} E_y^u \{ \int_0^t k(x_s,u_s) I_G(x_s)\,ds + \int_0^t k_0(x_s,u_s) I_{\partial G}(x_s)\,d\mu_s \}$.

+To construct the Wiener process $w(\cdot)$, we may have to augment the
probability space by adding an independent Wiener process.


Since $\rho \equiv 1$, we can set $\mu_s = s$. The formal dynamic programming
equation (3) is replaced by

(9)  $\inf_{\alpha \in \mathcal{U}} [\mathcal{L}^\alpha V(x) + k(x,\alpha) - \gamma] = 0$, $x \in G$,
     $\inf_{\alpha \in \mathcal{U}_0} [V_x'(x)\gamma(x,\alpha) + k_0(x,\alpha) - \gamma] = 0$, $x \in \partial G$,

where $V(\cdot)$ is now assumed to be bounded. If there is a solution
to the submartingale problem corresponding to admissible control
$v(\cdot)$ and initial condition $y$, and also a smooth function $V(\cdot)$
and constant $\gamma$ solving (9), then

(10)  $\gamma \le \gamma^v(y)$.

If there is a Borel admissible control $u(\cdot)$ which attains the
infimum in (9), and for which the submartingale problem has a
solution for each initial condition $x$, then $\gamma = \gamma^u(y)$ and $u(\cdot)$ is
optimal. We emphasize that although (9) will serve as the basis of
our approximation, it need not have a solution of any sort for our
method to be valid.


5. Discretization. There are a number of techniques for getting an
approximating sequence of Markov chain control problems with the
correct convergence properties. We use the method in [2] mainly
because it is relatively straightforward, fairly well understood
and we can refer to existing results. The method is based on a
finite difference approximation with difference interval $h$. A
particular (but natural) finite difference approximation to (9) is
used. It makes no difference whether or not (9) has a smooth
solution, for the finite difference approximation is not used to
solve (9). After a suitable rearrangement, the coefficients of
certain terms in the finite difference approximation will be
transition probabilities for an approximating controlled Markov
chain. This is the only use to which (9) will be put. The method
gives us an approximating chain simply and automatically. A
detailed outline of the method and of some of the convergence
properties will be given, but many of the details which can be
found in the basic references [2], [3], [4] will be omitted.

Let $e_i$ = unit vector in the $i$th coordinate direction, and assume for
convenience that each side of $G$ is an integral multiple of $h$.
Let $G_h$ denote the finite difference grid on $G$, and set
$\partial G_h = \bar G_h - G_h$, where $\bar G_h$ is the finite difference grid on $\bar G$. Now, let
us discretize (9). On $\partial G_h$ use the approximation

(11)  $V_{x_i}(x) \to [V(x+e_i h) - V(x)]/h$, if $\gamma_i(x,\alpha) > 0$,
      $V_{x_i}(x) \to [V(x) - V(x-e_i h)]/h$, if $\gamma_i(x,\alpha) < 0$.

In $G_h$, use the approximation

(12)  $V_{x_i}(x) \to [V(x+e_i h) - V(x)]/h$, if $f_i(x,\alpha) > 0$,
      $V_{x_i}(x) \to [V(x) - V(x-e_i h)]/h$, if $f_i(x,\alpha) < 0$,
      $V_{x_i x_i}(x) \to [V(x+e_i h) + V(x-e_i h) - 2V(x)]/h^2$.

The approximations for $V_{x_i x_j}(x)$, $i \ne j$, are long, and the reader
is referred to [2], Chapter 6.2 for one set of possibilities.
Simply to avoid writing these down here, we suppose that $\sigma(x)\sigma'(x)$
is diagonal. This assumption is not required by anything except
our current laziness. It does not affect the outcome, only the
precise form of the functions $Q_h(\cdot,\cdot)$ and $p^h(\cdot,\cdot)$ introduced
below.

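Define $Q_h(x,\cdot)$, $\Delta t^h(x)$ and $\bar Q_h(x)$ by

$Q_h(x,\alpha) = h \sum_i |f_i(x,\alpha)| + \sum_i \sigma_i^2(x)$, $x \in G_h$,

$Q_h(x,\alpha) = \sum_i |\gamma_i(x,\alpha)|$, $x \in \partial G_h$,

$\bar Q_h(x) = \sup_\alpha Q_h(x,\alpha)$

(where $\alpha$ ranges over the appropriate set $\mathcal{U}$ or $\mathcal{U}_0$, and
$\sigma_i^2(x)$ is the $i$th diagonal entry of $\sigma(x)\sigma'(x)$),

$\Delta t^h(x) = h/\bar Q_h(x)$ on $\partial G_h$,
$\Delta t^h(x) = h^2/\bar Q_h(x)$ on $G_h$.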


Approximating the derivatives in (9) by (11)-(12) and rearranging
terms yields the following equation, where $\gamma^h$ and $V^h(\cdot)$ are
used to denote the solution to the discretized equation and we use
the definitions $g^+(x) = \max[g(x), 0]$ and $g^-(x) = \max[0, -g(x)]$.

(13)  $h^2 \gamma^h = \inf_{\alpha \in \mathcal{U}} [-Q_h(x,\alpha) V^h(x) + \sum_{i,\pm} V^h(x \pm e_i h)(h f_i^\pm(x,\alpha) + \sigma_i^2(x)/2) + h^2 k(x,\alpha)]$, $x \in G_h$,

      $h \gamma^h = \inf_{\alpha \in \mathcal{U}_0} [-Q_h(x,\alpha) V^h(x) + \sum_{i,\pm} V^h(x \pm e_i h) \gamma_i^\pm(x,\alpha) + h k_0(x,\alpha)]$, $x \in \partial G_h$.

Define $p^h(x, x \pm e_i h | \alpha)$ = (coefficient of $V^h(x \pm e_i h)$)$/\bar Q_h(x)$,
$p^h(x,x|\alpha) = [\bar Q_h(x) - Q_h(x,\alpha)]/\bar Q_h(x)$. Divide (13) through by
$\bar Q_h(x)$ and rearrange to get

(14)  $V^h(x) + \gamma^h \Delta t^h(x) = \inf_{\alpha \in \mathcal{U}} [\sum_{i,\pm} V^h(x \pm e_i h) p^h(x, x \pm e_i h | \alpha) + V^h(x) p^h(x,x|\alpha) + k(x,\alpha) \Delta t^h(x)]$, $x \in G_h$,


and similarly for $x$ in $\partial G_h$, where $\mathcal{U}$ and $k$ are replaced by
$\mathcal{U}_0$ and $k_0$, resp. Define $p^h(x,y|\alpha) = 0$ for all $x, y$ other
than $y = x$ or $y = x \pm e_i h$ for some $i$. Then $\{p^h(x,y|\alpha),\ x,
y \in \bar G_h\}$ is a transition probability for a controlled Markov chain.
Let $\{\xi_n^h\}$ denote the random variables of the chain, and define
$\mathcal{U}(x) = \mathcal{U}$ in $G$, and $\mathcal{U}(x) = \mathcal{U}_0$ on $\partial G$, and redefine $k(x,\alpha)$
to equal $k_0(x,\alpha)$ for $x \in \partial G$. Then (14) can be rewritten in the
form

(15)  $V^h(x) + \gamma^h \Delta t^h(x) = \inf_{\alpha \in \mathcal{U}(x)} [E_x^\alpha V^h(\xi_1^h) + k(x,\alpha) \Delta t^h(x)]$, $x \in \bar G_h$.
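To make the construction of the transition probabilities and interpolation intervals concrete, here is a minimal sketch for a one-dimensional example. All model choices are assumptions for illustration and do not come from the paper: $G = (0,1)$, drift $f(x,\alpha) = \alpha$ with $\alpha \in \{-1, 0, 1\}$, constant $\sigma = 0.5$, and uncontrolled reflection directions $\gamma = +1$ at $x = 0$ and $\gamma = -1$ at $x = 1$. The function name `build_chain` is hypothetical.

```python
def build_chain(n, sigma=0.5, controls=(-1.0, 0.0, 1.0)):
    """Return (dt, p) for the grid x_j = j/n on [0, 1], assuming n >= 2.

    dt[j] is the interpolation interval at grid point j, and p[j][a] maps
    next-state index -> probability when control number a is used at j.
    """
    h = 1.0 / n
    s2 = sigma * sigma
    q_bar = h * max(abs(a) for a in controls) + s2   # sup over alpha of Q_h
    dt, p = [], []
    for j in range(n + 1):
        if j == 0 or j == n:
            # Boundary point: Q_h = |gamma| = 1, so dt = h and the chain
            # steps one grid point in the direction of gamma.
            inward = 1 if j == 0 else n - 1
            dt.append(h / 1.0)
            p.append([{inward: 1.0}])
            continue
        dt.append(h * h / q_bar)
        rows = []
        for a in controls:
            f = a                                    # assumed drift f(x, alpha) = alpha
            up = (h * max(f, 0.0) + s2 / 2.0) / q_bar
            down = (h * max(-f, 0.0) + s2 / 2.0) / q_bar
            stay = (q_bar - (h * abs(f) + s2)) / q_bar
            rows.append({j + 1: up, j - 1: down, j: stay})
        p.append(rows)
    return dt, p


if __name__ == "__main__":
    dt, p = build_chain(10)
    print(dt[0], dt[5])
    print(p[5])
```

The self-transition probability `stay` is positive exactly when the control in use does not attain the supremum defining $\bar Q_h(x)$, which is the "jump of zero magnitude" that the interpolations have to allow for.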

In (13)-(15), we supposed that $\gamma^h$ is a constant. This is almost
equivalent to the assumption that there is only one recurrence
class for the chain under the optimal control. If there is more
than one recurrence class, the numerical problem is harder. Let us
henceforth assume

(A2) For each small $h$ and under each stationary pure Markov
control, there is only one recurrence class.

This assumption seems to hold in very many cases of practical
interest. It can be dispensed with, but then the problem of
actually solving (13)-(15) is much harder. Under (A2), (15) can be
solved by either Howard's iteration in policy space for semi-Markov
processes, or by a version of the backward iteration method for the
average cost per unit time problem (see, e.g., Schweitzer and
Federgruen [8], but adapted to a semi-Markov process model). There
is an optimal stationary pure Markov control $u^h(\cdot)$ for all small
$h$, it is the minimizer in (15), and it is optimal with respect to
all controls for the discrete problem. The "Semi-Markov" point will
be returned to below. The optimal solution is given in the first
line of (19).
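One way to solve (15) numerically is sketched below: a relative value (backward) iteration applied after a standard data transformation that replaces the state dependent intervals $\Delta t^h(x)$ by a common step `tau`. This is an illustrative scheme under stated assumptions, not necessarily the procedure of [8] or of Howard. The one-dimensional model (drift $\alpha \in \{-1,0,1\}$, $\sigma^2 = 0.25$, cost rate $(x - 0.5)^2 + 0.1\alpha^2$ in the interior and $k_0 = 1$ on the boundary) and all function names are hypothetical.

```python
CONTROLS = (-1.0, 0.0, 1.0)   # hypothetical interior control set
SIGMA2 = 0.25                 # hypothetical sigma(x)^2


def model(n):
    """Return (dt, p, k) for the chain on the grid j/n, j = 0..n (n >= 2)."""
    h = 1.0 / n
    q_bar = h * max(abs(a) for a in CONTROLS) + SIGMA2
    dt, p, k = [], [], []
    for j in range(n + 1):
        x = j * h
        if j in (0, n):                       # reflecting boundary point
            dt.append(h)                      # h / Q_bar with Q_bar = |gamma| = 1
            p.append([{(1 if j == 0 else n - 1): 1.0}])
            k.append([1.0])                   # assumed boundary cost rate k_0
            continue
        dt.append(h * h / q_bar)
        p.append([{j + 1: (h * max(a, 0.0) + SIGMA2 / 2) / q_bar,
                   j - 1: (h * max(-a, 0.0) + SIGMA2 / 2) / q_bar,
                   j: (q_bar - (h * abs(a) + SIGMA2)) / q_bar}
                  for a in CONTROLS])
        k.append([(x - 0.5) ** 2 + 0.1 * a * a for a in CONTROLS])
    return dt, p, k


def solve_average_cost(n, tol=1e-12, max_iter=200000):
    """Relative value iteration for (15) with a common time step tau."""
    dt, p, k = model(n)
    tau = 0.5 * min(dt)                       # strictly below every dt[j]
    v = [0.0] * (n + 1)
    gamma = 0.0
    for _ in range(max_iter):
        new = []
        for j in range(n + 1):
            c = tau / dt[j]                   # chance of a "real" step in time tau
            new.append(min(
                tau * k[j][a] + (1.0 - c) * v[j]
                + c * sum(pr * v[y] for y, pr in p[j][a].items())
                for a in range(len(p[j]))))
        gamma = new[0] / tau                  # state 0 is the reference, v[0] = 0
        new = [w - new[0] for w in new]
        done = max(abs(a - b) for a, b in zip(new, v)) < tol
        v = new
        if done:
            break
    return gamma, v, dt, p, k


if __name__ == "__main__":
    gamma, v, dt, p, k = solve_average_cost(10)
    print(gamma)
```

The transformed chain keeps every state with probability at least 1/2 at each step, which removes periodicity. At a fixed point the transformed equation is (15) multiplied by `tau / dt[j]`, so the returned pair `(gamma, v)` satisfies (15) up to the stopping tolerance.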

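Discussion of (14). For $y \in G_h$, we have for any stationary pure
Markov control $u(\cdot)$

(16a)  $E_y^u[\xi_{n+1}^h - \xi_n^h \mid \xi_n^h = y,\ u(\cdot) \text{ used}] = f(y,u(y)) \Delta t^h(y)$,
       $\mathrm{cov}_y^u[\xi_{n+1}^h - \xi_n^h \mid \xi_n^h = y,\ u(\cdot) \text{ used}] = \sigma(y)\sigma'(y) \Delta t^h(y) + o(\Delta t^h(y))$, $y \in G_h$.

For $y \in \partial G_h$,

(16b)  $E_y^u[\xi_{n+1}^h - \xi_n^h \mid \xi_n^h = y,\ u(\cdot) \text{ used}] = \gamma(y,u(y)) \Delta t^h(y)$,
       $\mathrm{cov}_y^u[\xi_{n+1}^h - \xi_n^h \mid \xi_n^h = y,\ u(\cdot) \text{ used}] = o(\Delta t^h(y))$.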


These "infinitesimal" properties (derived in [2], [3]), together
with (15), suggest a close relation between the controlled chain
and the controlled reflected diffusion.
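The interior relations in (16a) can be checked directly in one dimension. The sketch below computes the mean and variance of one step of the chain from the transition probabilities; the function name `local_moments` and the numerical values in the usage lines are assumptions for illustration only.

```python
def local_moments(h, f, sigma2, f_max):
    """Mean and variance of one interior step of the chain, and its dt.

    h: grid spacing, f: drift value at the point, sigma2: sigma(x)^2,
    f_max: bound on |f| over the controls, so Q_bar = h * f_max + sigma2.
    """
    q_bar = h * f_max + sigma2
    dt = h * h / q_bar
    up = (h * max(f, 0.0) + sigma2 / 2.0) / q_bar
    down = (h * max(-f, 0.0) + sigma2 / 2.0) / q_bar
    mean = h * (up - down)                       # E[xi_{n+1} - xi_n]
    var = h * h * (up + down) - mean * mean      # var[xi_{n+1} - xi_n]
    return mean, var, dt


if __name__ == "__main__":
    for h in (0.1, 0.01, 0.001):
        mean, var, dt = local_moments(h, 0.7, 0.25, 1.0)
        print(h, mean / dt, var / dt)
```

In this scalar case the mean is exactly $f \Delta t^h$, while the variance equals $\sigma^2 \Delta t^h$ plus a remainder $(h|f| - f^2 \Delta t^h)\Delta t^h$, which is $o(\Delta t^h)$ as $h \to 0$.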

These relations are brought out quite clearly when the chain is
suitably interpolated into a continuous parameter process, and (15),
(16) suggest several useful interpolations. First, we note that
solving (15) is the only computation that need be done. Equation
(15) is not quite the dynamic programming equation for the average
cost per unit time for the controlled chain $\{\xi_n^h\}$, since $\gamma^h$ has
a state dependent coefficient $\Delta t^h(\cdot)$. However, it is the dynamic
programming equation for a semi-Markov process or, equivalently,
for the types of continuous parameter interpolations which are
discussed below.

Let $\pi^h$ denote the invariant measure which corresponds to the
optimal control. Henceforth, unless otherwise mentioned, $\{\xi_n^h\}$
refers to the optimal chain, with initial measure $\pi^h$.

We now choose an interpolation method and show that the sequence
of interpolated processes converges weakly to a solution to the
submartingale problem corresponding to some admissible control
$u(\cdot)$, and that this solution is an optimal one, with cost rate
$\gamma = \lim_{h \to 0} \gamma^h$.

Either of the following two piecewise constant interpolations will
serve our purpose.

Interpolation 1. Define $\Delta t^h(\xi_i^h) = \Delta t_i^h$, $t_n^h = \sum_{i=0}^{n-1} \Delta t_i^h$. Define the
semi-Markov process $\xi^h(\cdot)$ by $\xi^h(t) = \xi_n^h$ on $[t_n^h, t_{n+1}^h)$. This
interpolation was used in [2], [3].

Interpolation 2. Let $\xi^h(\cdot)$ denote the Markov jump process on
$\bar G_h$ defined by:

If $\xi^h(t) = y$, then the average additional time spent in state $y$
before a jump is $\Delta t^h(y)$, and $P\{\text{next state} = y' \mid \text{current state} = y\}$
$= p^h(y, y' | u^h(y))$. There is a slight ambiguity here since it is
possible that $p^h(y,y|u^h(y)) > 0$. But this should cause no
confusion, for it simply means that there is a jump of "zero"
magnitude. The average interjump times can be normalized to avoid
this, but it hardly seems worthwhile. Note that

$P\{\text{jump in } (t, t+\Delta] \mid \xi^h(t) = y\} = (\Delta/\Delta t^h(y)) + o(\Delta)$.

This interpolation is developed in Section 8 of [4].
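The two interpolations can be compared on one simulated path of the embedded chain: Interpolation 1 holds each state for the deterministic time $\Delta t^h(y)$, while Interpolation 2 holds it for an exponentially distributed time with mean $\Delta t^h(y)$. The sketch below is illustrative only. The one-dimensional model, the fixed feedback control that pushes toward the centre of $[0,1]$ (it is not claimed to be the optimal $u^h$), the cost rates and the function name `simulate` are all assumptions.

```python
import random


def simulate(n=10, steps=200000, seed=0):
    """Return time-average costs under Interpolation 1 and Interpolation 2."""
    rng = random.Random(seed)
    h = 1.0 / n
    s2 = 0.25                      # hypothetical sigma^2
    q_bar = h * 1.0 + s2           # sup of Q_h over controls with |alpha| <= 1
    j = n // 2
    cost1 = time1 = cost2 = time2 = 0.0
    for _ in range(steps):
        if j in (0, n):            # boundary: mean sojourn h, then step inward
            dt, rate = h, 1.0      # assumed boundary cost rate k_0 = 1
            nxt = 1 if j == 0 else n - 1
        else:
            a = 1.0 if 2 * j < n else (-1.0 if 2 * j > n else 0.0)
            dt = h * h / q_bar
            rate = (j * h - 0.5) ** 2 + 0.1 * a * a
            up = (h * max(a, 0.0) + s2 / 2.0) / q_bar
            down = (h * max(-a, 0.0) + s2 / 2.0) / q_bar
            r = rng.random()
            nxt = j + 1 if r < up else (j - 1 if r < up + down else j)
        hold = rng.expovariate(1.0 / dt)   # exponential sojourn with mean dt
        cost1 += rate * dt                 # Interpolation 1: deterministic sojourn
        time1 += dt
        cost2 += rate * hold               # Interpolation 2: random sojourn
        time2 += hold
        j = nxt
    return cost1 / time1, cost2 / time2


if __name__ == "__main__":
    print(simulate())
```

Both averages estimate the same long-run cost rate for the chosen control, since the two interpolations share the embedded chain and have the same mean sojourn time in every state.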

Neither interpolation is always preferable to the other.
Interpolation 2 could have been used in references [2], [3], but there
did not seem to be a need for it then. There are advantages to
having an interpolation which is a continuous parameter Markov chain
in that certain concepts (such as stationarity) have a clearer
meaning; on the other hand it is sometimes preferable to work with
interpolation times that are deterministic functions of the current
state, since then there are fewer random variables to worry about.
The limiting processes (see Sections 6 and 7) are the same for both
interpolations. In Case 2, the average sojourn time in a state $y$
(before the next jump, whether of zero value or not) is $\Delta t^h(y)$,
precisely the interpolation interval for Case 1. In both cases, the
time spent at a state $y$ on the boundary ($O(h)$ per sojourn) is
greater than the time spent at a state $y$ in $G_h$ ($O(h^2)$ per sojourn,
unless there is the complete degeneracy $\sigma(y) = 0$). This property
is a consequence of our definition of $\Delta t^h(y)$ for $y \in \partial G_h$
(to correspond to $\rho(y) \equiv 1$).

For either Interpolation 1 or 2,

(17)  $\gamma^h = \lim_{t\to\infty} E_x^h \int_0^t k(\xi_s^h, u_s^h)\,ds / t$,

where $u_s^h = u^h(\xi_s^h)$, and $E_x^h$ indicates that $u^h$ is used. The
invariant measure for either interpolation is $\mu^h$, where

(18a)  $\mu^h(y) = \Delta t^h(y) \pi^h(y) / \sum_z \Delta t^h(z) \pi^h(z)$.

Also,

(18b)  $\gamma^h = \sum_y \mu^h(y) k(y, u^h(y))$.

Equations (17) and (18) are not hard to verify. For example,
(18) follows from the ergodic theorems for Markov chains (see
Chung [6], Section 1.15, Theorems 1, 2, 3; see also [2],
Chapter 6.8, for similar calculations). It can also be obtained
by direct verification of the Kolmogorov equation using the
invariance of $\pi^h(\cdot)$ for the discrete parameter chain. To get (17)
write $u_i^h$ for $u^h(\xi_i^h)$ and use (15) and the same ergodic theorems
to get

(19)  $\gamma^h = \lim_{n\to\infty} E_x^h [\sum_{i=0}^{n-1} k(\xi_i^h, u_i^h) \Delta t_i^h] / E_x^h [\sum_{i=0}^{n-1} \Delta t_i^h]$

      $= \lim_n [\sum_{i=0}^{n-1} k(\xi_i^h, u_i^h) \Delta t_i^h / \sum_{i=0}^{n-1} \Delta t_i^h]$  (w.p.1)

      $= \lim_n \int_0^{t_n^h} k(\xi_s^h, u_s^h)\,ds / t_n^h$  (w.p.1)  $= \lim_{t\to\infty} \int_0^t k(\xi_s^h, u_s^h)\,ds / t$  (w.p.1)

      $= \lim_{t\to\infty} E_x^h \int_0^t k(\xi_s^h, u_s^h)\,ds / t$.


Similarly, the first limit in (19) equals

(20)  $\gamma^h = \sum_y \pi^h(y) k(y, u^h(y)) \Delta t^h(y) / \sum_y \pi^h(y) \Delta t^h(y)$

      $= \sum_y \mu^h(y) k(y, u^h(y))$.


Let $v(\cdot)$ denote a stationary pure Markov control. Then (15)
implies that ($\xi_i^h$ here now refer to the variables under control
$v(\cdot)$) for any $x$

(21)  $\gamma^h \le \lim_{n\to\infty} E_x^v [\sum_{i=0}^{n-1} \Delta t_i^h k(\xi_i^h, v(\xi_i^h))] / E_x^v [\sum_{i=0}^{n-1} \Delta t_i^h]$.

The proof of optimality of $u^h(\cdot)$ with respect to any control which
is not necessarily stationary pure Markov can be based on a method
of Ross [7] and is omitted.


6. Weak Convergence. We will work with Interpolation 2, since it
is a strictly stationary process. The method will be outlined, but
for proofs we will usually refer to results already available
elsewhere. So far, we have a sequence of stationary pure Markov
controls $\{u^h(\cdot)\}$, corresponding stationary continuous parameter
Markov chains $\{\xi^h(\cdot)\}$, invariant measures $\{\mu^h\}$, and minimum costs
$\{\gamma^h\}$, where

$\gamma^h = \sum_{y \in \bar G_h} \mu^h(y) k(y, u^h(y)) = \sum_{y \in G_h} \mu^h(y) k(y, u^h(y)) + \sum_{y \in \partial G_h} \mu^h(y) k_0(y, u^h(y))$,

and

(22)  $\gamma^h t = E^h [\int_0^t k(\xi_s^h, u_s^h) I_G(\xi_s^h)\,ds + \int_0^t k_0(\xi_s^h, u_s^h) I_{\partial G}(\xi_s^h)\,ds]$,

where $E^h$ denotes the expectation under initial measure $\mu^h$ and
we use $u_s^h = u^h(\xi_s^h)$. We often write $\xi^h(s)$ as $\xi_s^h$, etc., for
typographical simplicity.

We obviously can write

(23)  $\xi_t^h = \xi_0^h + \int_0^t I_G(\xi_s^h) f(\xi_s^h, u_s^h)\,ds + \int_0^t I_{\partial G}(\xi_s^h) \gamma(\xi_s^h, u_s^h)\,ds + B^h(t) + B_0^h(t)$,

where

$B^h(t) = \int_0^t I_G(\xi_s^h) [d\xi_s^h - f(\xi_s^h, u_s^h)\,ds]$,

$B_0^h(t) = \int_0^t I_{\partial G}(\xi_s^h) [d\xi_s^h - \gamma(\xi_s^h, u_s^h)\,ds]$.


Denote the two integrals in (22) by $K^h(t)$ and $K_0^h(t)$, resp.,
and the first two integrals on the right side of (23) by $Q^h(t)$
and $Q_0^h(t)$, resp. Let $D^m[0,\infty)$ denote the space of $R^m$ valued
functions on $[0,\infty)$, continuous on the right and with left-hand
limits (Billingsley [9], Lindvall [10], Kushner [2], Chapter 2),
endowed with the Skorokhod topology. If a measure $\nu_n$ induces a
process $X_n(\cdot)$ with paths in $D^m[0,\infty)$ w.p.1 and $\{\nu_n\}$ is tight,
we abuse terminology and say that $\{X_n(\cdot)\}$ is tight. If $\{\nu_n\}$
converges weakly to a measure $\nu$ and $\nu$ induces a process $X(\cdot)$
with paths in $D^m[0,\infty)$ w.p.1, we say that $\{X_n(\cdot)\}$ converges
weakly to $X(\cdot)$. We occasionally use Skorokhod imbedding ([11],
Theorem 3.1.1, or [2], Chapter 2), which says that if $X_n(\cdot) \to X(\cdot)$
weakly in $D^m[0,\infty)$, then there are processes $\tilde X(\cdot)$, $\tilde X_n(\cdot)$ with
paths in $D^m[0,\infty)$ and which induce the same measures on $D^m[0,\infty)$
as do $X(\cdot)$, $X_n(\cdot)$, resp., and are such that $\tilde X_n(\cdot) \to \tilde X(\cdot)$ w.p.1
in the Skorokhod topology. Since all our limit processes will be
continuous w.p.1, this implies that $\tilde X_n(t) \to \tilde X(t)$, uniformly on
bounded intervals. Also, we omit the tilde ~ notation. The
following theorem follows from the results in [4], Section 8.


Theorem 1.+ $\{\xi^h(\cdot), K^h(\cdot), K_0^h(\cdot), B^h(\cdot), B_0^h(\cdot), Q^h(\cdot), Q_0^h(\cdot)\}$
is tight on $D^{5r+2}[0,\infty)$, and all limits have continuous
paths w.p.1.

We will next characterize the limits of $\{B^h(\cdot), B_0^h(\cdot)\}$.
Let us choose a weakly convergent subsequence, also indexed
by $h$, and henceforth fixed. The subsequent results will not depend
upon the selected subsequence. Denote the limit by $\xi(\cdot)$, $K(\cdot)$,
$K_0(\cdot)$, $B(\cdot)$, $B_0(\cdot)$, $Q(\cdot)$, $Q_0(\cdot)$. By construction, $B^h(\cdot)$ and
$B_0^h(\cdot)$ are martingales (with respect to the $\sigma$-algebras
induced by $\xi_s^h$, $s \le t$) and an easy calculation yields that

$E \sup_{t \le T} |B_0^h(t)|^2 \le \text{constant} \cdot hT$.

Thus $B_0(\cdot)$ is the zero process.

+Theorem 1 does not require A1 or A2 and holds whether the initial
conditions are random or not. It needs only the boundedness and
continuity of $f$, $\sigma$, $k$, $k_0$ and $\gamma$. Also, $u^h$ can be replaced by any
pure Markov control.

The quadratic variation of $B^h(\cdot)$ is

$\int_0^t \Sigma^h(\xi_s^h, u_s^h) I_G(\xi_s^h)\,ds$,

where $\Sigma^h(x,\alpha)$ is such that it converges to $\sigma(x)\sigma'(x)$ as $h \to 0$,
uniformly in $x$, and $\sup_h E|B^h(t)|^4 < \infty$ for each $t \ge 0$. Then
$\{|B^h(t)|^2\}$ is uniformly integrable for each $t$. Let $\mathcal{B}_t$ denote
the $\sigma$-algebra induced by $\{\xi_s, B(s), K(s), K_0(s), Q(s), Q_0(s),\ s \le t\}$.

Let $N_\varepsilon$ denote an $\varepsilon$-neighborhood of $\partial G$. In [3], Lemma 1, it is shown that for each real $T > 0$ there is a constant $K$ such that, for Interpolation 1 and small $\varepsilon > 0$,

(24)    $E_x^u \int_0^T I_{N_\varepsilon}(\xi_s^h)\, I_G(\xi_s^h)\, ds \le K\varepsilon,$

uniformly in $u$, $h$ (although $u$ did not appear in the derivation, only an upper bound to the values of the drift function $f$ was used in the derivation). The result (24) depends only on the fact that the component of the diffusion term $\sigma(x)\,dw$ orthogonal to the boundary is uniformly non-degenerate on $\partial G$; i.e., on (A1).

Estimate (24) also holds for Interpolation 2, and is crucial for the rest of the development. It says that neither the approximations nor the limit can "linger" near (but not on) the boundary. In particular, it implies that the probability is zero that over some subinterval of $[0,T]$ the paths for the approximations will be in $N_\varepsilon \cap G$ and the limit will be on $\partial G$.

Theorem 2. Assume A1. $\{B(t), \mathcal{B}_t\}$ is a continuous martingale with quadratic covariation $\int_0^t I_G(\xi_s)\, \sigma(\xi_s)\sigma'(\xi_s)\, ds$.

Proof. The proof, using (24), follows similar calculations in [2], [3], [4]. Let $q^h(t)$ represent the vector of the processes listed in Theorem 1, evaluated at time $t$, and let $q(t)$ denote its limit. Let $n$ denote an arbitrary integer, $t_i$, $i \le n$, numbers less than or equal to $t$, let $s > 0$ and let $g(\cdot)$ denote a continuous real valued function. By weak convergence, Skorokhod imbedding and the uniform integrability of $\{|B^h(t)|\}$ for each $t$, the result (martingale property of $B^h(\cdot)$)

    $E\, g(q^h(t_i), i \le n)\,[B^h(t+s) - B^h(t)] = 0$

implies

    $E\, g(q(t_i), i \le n)\,[B(t+s) - B(t)] = 0.$

Also, the result

    $E\, g(q^h(t_i), i \le n)\,\big[(B^h(t+s) - B^h(t))(B^h(t+s) - B^h(t))' - \int_t^{t+s} \Sigma^h(\xi_\tau^h)\, I_G(\xi_\tau^h)\, d\tau\big] = 0,$

together with the weak convergence, Skorokhod imbedding, the uniform integrability of $\{|B^h(t)|^2\}$ and (24), implies that

    $E\, g(q(t_i), i \le n)\,\big[(B(t+s) - B(t))(B(t+s) - B(t))' - \int_t^{t+s} I_G(\xi_\tau)\, \sigma(\xi_\tau)\sigma'(\xi_\tau)\, d\tau\big] = 0.$

The arbitrariness of $g(\cdot)$, $t$, $t+s$, $t_i$, $i \le n$, and $n$ implies the theorem. Q.E.D.
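
The closing appeal to arbitrariness can be unpacked as below. This is a sketch of the usual argument rather than a quotation from the paper; it assumes that the test functions range over bounded continuous functions and writes $\mathcal{B}_t$ for the $\sigma$-algebra of Theorem 2.

```latex
% The limit identities hold for every test function g of the past values
% q(t_i), t_i <= t.  Such functions determine conditional expectations
% given \mathcal{B}_t, so the first identity is equivalent to
E\big[\, B(t+s) - B(t) \,\big|\, \mathcal{B}_t \,\big] = 0 ,
% which is the martingale property, and the second is equivalent to
E\big[\, (B(t+s)-B(t))(B(t+s)-B(t))' \,\big|\, \mathcal{B}_t \,\big]
  = E\Big[ \int_t^{t+s} I_G(\xi_\tau)\, \sigma(\xi_\tau)\sigma'(\xi_\tau)\, d\tau
      \,\Big|\, \mathcal{B}_t \Big] ,
% which identifies the quadratic covariation stated in Theorem 2.
```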


We next need a representation for $Q(\cdot)$, $Q_0(\cdot)$, $K(\cdot)$ and $K_0(\cdot)$. It is easy to see that all these functions are absolutely continuous with respect to Lebesgue measure. Thus, there are measurable $(\omega,t)$ functions $\bar f(\cdot)$, $\bar f_0(\cdot)$, $\bar k(\cdot)$ and $\bar k_0(\cdot)$ such that, for almost all $\omega$, $t$,

    $Q(t) = \int_0^t \bar f(s)\, ds, \qquad Q_0(t) = \int_0^t \bar f_0(s)\, ds,$

    $K(t) = \int_0^t \bar k(s)\, ds, \qquad K_0(t) = \int_0^t \bar k_0(s)\, ds.$


†Actually, uniform integrability of $\{|B^h(t)|^2\}$ (implied by $\sup_h E|B^h(t)|^4 < \infty$) is not needed. Since $B(\cdot)$ is a square integrable continuous martingale, its quadratic variation can be obtained by a "localization" of the argument.


We can now proceed in two ways, either working with generalized random controls or by imposing a convexity condition and using an implicit function theorem. We take the latter (and easier) approach.

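Theorem 3. Assume A1 and A2. Let $f$, $k$, $k_0$, $\gamma$, $\sigma$ be continuous and let the sets $\{f(x,\alpha), k(x,\alpha), \alpha \in \mathcal{U}\} \equiv g(x,\mathcal{U})$ and $\{\gamma(x,\alpha), k_0(x,\alpha), \alpha \in \mathcal{U}_0\} \equiv g_0(x,\mathcal{U}_0)$ be convex for each $x$. Then there is a control $\bar u(\cdot)$ with values $\bar u_s$ in $\mathcal{U}$ when $\xi_s \in G$ and in $\mathcal{U}_0$ when $\xi_s \in \partial G$ and such that, for almost all $\omega$, $t$,

    $\bar f(t) = f(\xi_t, \bar u_t)\, I_G(\xi_t), \qquad \bar k(t) = k(\xi_t, \bar u_t)\, I_G(\xi_t),$

    $\bar f_0(t) = \gamma(\xi_t, \bar u_t)\, I_{\partial G}(\xi_t), \qquad \bar k_0(t) = k_0(\xi_t, \bar u_t)\, I_{\partial G}(\xi_t).$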


Proof. Define $g(t) = (\bar f(t), \bar k(t))$ and $g_0(t) = (\bar f_0(t), \bar k_0(t))$. The proof uses the basic estimate (24) and the method of [2], pp. 182-183. By (24) and [2], pp. 182-183, for almost all $\omega$, $t$,

    $g(t) \in g(\xi_t, \mathcal{U})\, I_G(\xi_t),$

    $g_0(t) \in g_0(\xi_t, \mathcal{U}_0)\, I_{\partial G}(\xi_t),$

from which the result follows by the McShane-Warfield implicit function theorem as in [2], Theorem 9.2.2. Q.E.D.

Summing up the results of Theorems 1 to 3, we get the representation (under A1 and A2)

(25)    $\xi_t = \xi_0 + \int_0^t I_G(\xi_s)\, f(\xi_s, \bar u_s)\, ds + \int_0^t I_{\partial G}(\xi_s)\, \gamma(\xi_s, \bar u_s)\, ds + B(t),$

where $B(t)$ is a continuous martingale with quadratic variation

    $\int_0^t I_G(\xi_s)\, \sigma(\xi_s)\sigma'(\xi_s)\, ds.$

Also, there is a Wiener process $w(\cdot)$, with respect to which all the other processes in (25) are non-anticipative and such that $B(t) = \int_0^t I_G(\xi_s)\, \sigma(\xi_s)\, dw(s)$. Obviously, by the weak convergence, $\xi_t$ is in $\bar G$ for all $t$. Let $\mathcal{L}^{\bar u}$ denote the differential generator associated with (25) in $G$. By a slight modification of the argument associated with (40) and (41) in [3], we can show that $\xi(\cdot)$ solves the sub-martingale problem.

Furthermore, $\xi(\cdot)$ is a stationary process. Let its invariant measure be denoted by $\bar\mu$ (which is the weak limit of $\{\mu^h\}$), and let $\bar\gamma = \lim_h \gamma^h$. Then the distribution of $\xi_t$ is $\bar\mu$. By (22), (24),

(26)    $\bar\gamma\, t = E\big[\int_0^t I_G(\xi_s)\, k(\xi_s, \bar u_s)\, ds + \int_0^t I_{\partial G}(\xi_s)\, k_0(\xi_s, \bar u_s)\, ds\big].$


Remarks. The limit process $\xi(\cdot)$ is stationary, as is the drift $\bar f(\cdot)$, but we have not been able to show that there is a Markov (reflecting diffusion) process with the same distributions. There probably is such a Markov process, as there probably is a stationary pure Markov control $u(\cdot)$ such that $\bar u_t = u(\xi_t)$ w.p.1. In any case, our method gives much information on the optimal process $\xi(\cdot)$; e.g., the multivariate distributions of $\xi^h(\cdot)$ converge weakly to those of $\xi(\cdot)$, as do the distributions of any bounded measurable functional $F(\xi^h(\cdot))$, if $F(x(\cdot))$ is continuous w.p.1 with respect to the measure induced by $\xi(\cdot)$. Indeed, one of the great advantages of the weak convergence method is that it yields such information, in addition to approximations to $\bar\gamma$. Also, $\bar\gamma$ is the average cost per unit time for $\xi(\cdot)$, and is the limit of the average costs per unit time for the sequence of approximations.
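
To make the notion of an average cost per unit time for an approximating chain concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the scheme of the paper: the state space is taken to be the interval [-1, 1], the control is a fixed feedback already folded into a hypothetical `drift` function, the boundary cost and the reflection direction are omitted (reflection is instantaneous and costless), and the names `average_cost`, `n`, `drift`, `cost` and `sigma` are invented for the example. The transition probabilities are of the usual upwind finite-difference type, and the invariant measure is obtained from detailed balance, which applies because the chain moves only to neighboring grid points.

```python
def average_cost(n, drift, cost, sigma=1.0):
    """Average cost per unit time of a birth-death chain on a grid over [-1, 1].

    Hypothetical illustration: n (number of grid intervals, n >= 2), drift,
    cost and sigma are assumptions of this sketch, not objects from the paper.
    """
    h = 2.0 / n                                  # grid spacing
    xs = [-1.0 + i * h for i in range(n + 1)]    # grid points; the two ends are the boundary
    up, down, dt = [], [], []                    # P(step right), P(step left), holding time
    for i, x in enumerate(xs):
        if i == 0:                               # lower boundary: instantaneous reflection inward
            up.append(1.0); down.append(0.0); dt.append(0.0)
        elif i == n:                             # upper boundary: instantaneous reflection inward
            up.append(0.0); down.append(1.0); dt.append(0.0)
        else:                                    # interior: upwind finite-difference probabilities
            b = drift(x)
            norm = sigma ** 2 + h * abs(b)
            up.append((0.5 * sigma ** 2 + h * max(b, 0.0)) / norm)
            down.append((0.5 * sigma ** 2 + h * max(-b, 0.0)) / norm)
            dt.append(h * h / norm)
    # Invariant measure (unnormalized) from detailed balance:
    # pi[i] * up[i] = pi[i + 1] * down[i + 1].
    pi = [1.0]
    for i in range(n):
        pi.append(pi[i] * up[i] / down[i + 1])
    # Time-weighted stationary average of the running cost.
    total_time = sum(p * t for p, t in zip(pi, dt))
    total_cost = sum(p * t * cost(x) for p, t, x in zip(pi, dt, xs))
    return total_cost / total_time
```

For example, `average_cost(200, lambda x: 0.0, lambda x: x * x)` is close to 1/3, the mean of x squared under the uniform distribution on [-1, 1], which is the invariant measure of a driftless reflected Brownian motion on that interval. Weighting by the holding times `dt` is what turns a per-step average into a per-unit-time average for the interpolated process.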

7. Optimality of the Limit $\xi(\cdot)$. Being a limit of optimal approximating processes, $\xi(\cdot)$ is a good candidate for optimality for the original optimization problem (with the reflected diffusion model). Certain optimality properties are easy to show.

Theorem 4. Assume A1 and A2. Let $v(\cdot)$ denote a continuous stationary pure Markov control, such that the corresponding reflecting diffusion $\xi^v(\cdot)$ is unique (in the weak sense) and has a unique invariant measure $\mu^v$. Then $\bar\gamma \le \gamma^v$ (where we let the initial measure be $\mu^v$).

Proof. Let $\{\xi_n^{v,h}\}$ and $\xi^{v,h}(\cdot)$ denote the discretized and interpolated processes, resp., corresponding to the fixed control $v(\cdot)$. Let $\gamma^{v,h}$ denote the cost for the interpolated process $\xi^{v,h}(\cdot)$. Then $\gamma^{v,h} \ge \gamma^h$ by the optimality of $\xi^h(\cdot)$. Let $\mu^{v,h}$ denote any invariant measure for $\xi^{v,h}(\cdot)$. Then $\xi^{v,h}(\cdot)$ and the invariant measures $\mu^{v,h}$ converge weakly to $\xi^v(\cdot)$ and $\mu^v$, resp., as $h \to 0$ by arguments similar to those in Theorems 1 to 2. The theorem follows from this and (24). Q.E.D.


Since we have not been able so far to prove that $\bar u(\cdot)$ is stationary pure Markov, it would be nice to prove that $\bar u(\cdot)$ is optimal with respect to a broader class of controls than those in Theorem 4. The class can be broadened, but at the expense of considerable terminology and detail. We refer the reader to [2], where broader classes of comparison controls are dealt with for a number of other types of optimization problems.
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